For ONVIF TTS audio proposal, to support device with TTS function #694

Open
Peggy0422 wants to merge 21 commits into development from video/TTS-audio-clip

Conversation

@Peggy0422

To support audio products with a TTS function, this PR makes the following changes:

1. Add TTSCapabilities (optional): indicates whether the device supports the TTS function and its corresponding TTS configuration. The complex type "TTSCapabilities" is added to the existing complex type "AudioClipCapabilities".
    Parameters:
  1. MaxContentLength: the maximum length of the text that the device can convert to an audio clip.
  2. TTSLanguage: the language(s) that the device supports for the TTS function.
  3. TTSVoiceType: the voice types that the device supports for the TTS function.
2. Add "AddTTSAudioClip" and "AddTTSAudioClipResponse": send a text, a TTS configuration and an audio clip configuration to the device. The device converts the text to an audio clip based on the TTS configuration and subsequently plays this audio clip based on the configuration.
    Parameters:
  1. Token (optional): token for the audio clip.
  2. Configuration: audio clip configuration to add, see element "Configuration".
  3. TTSConfiguration: for a TTS audio clip, specifies the audio content, language and voice type used when the device plays the clip.
    Response:
  1. Token: unique token of the TTS audio clip to be uploaded.
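
For illustration, the request described above might be serialized roughly as follows. This is only a sketch under assumptions: the `tr2` URI is the Media2 service namespace, but placing `Content`, `Language` and `VoiceType` in that namespace, leaving `Configuration` empty, the voice type value `"Female"`, and the helper name `build_add_tts_audio_clip` are all hypothetical and not taken from the WSDL in this PR.

```python
import xml.etree.ElementTree as ET

TR2 = "http://www.onvif.org/ver20/media/wsdl"  # Media2 service namespace


def build_add_tts_audio_clip(content, language, voice_type, token=None):
    """Return an AddTTSAudioClip element as an XML string (assumed layout)."""
    ET.register_namespace("tr2", TR2)
    root = ET.Element(f"{{{TR2}}}AddTTSAudioClip")
    # Token is optional in the proposal, so it is only emitted when given.
    if token is not None:
        ET.SubElement(root, f"{{{TR2}}}Token").text = token
    # Audio clip configuration; its children are omitted in this sketch.
    ET.SubElement(root, f"{{{TR2}}}Configuration")
    # TTS configuration carries the text, language and voice type.
    tts = ET.SubElement(root, f"{{{TR2}}}TTSConfiguration")
    ET.SubElement(tts, f"{{{TR2}}}Content").text = content
    ET.SubElement(tts, f"{{{TR2}}}Language").text = language
    ET.SubElement(tts, f"{{{TR2}}}VoiceType").text = voice_type
    return ET.tostring(root, encoding="unicode")


# "Female" is a placeholder; the actual TTSVoiceType values are defined by the PR.
request_xml = build_add_tts_audio_clip("Please leave the area.", "en-US", "Female")
print(request_xml)
```

A real client would wrap this body in a SOAP envelope and read the `Token` from the AddTTSAudioClipResponse.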

media2.wsdl

  1. Updated complexType "AudioClipCapabilities" with element "TTSCapabilities"; added complexType "TTSCapabilities" with attributes "MaxContentLength", "TTSLanguage" and "TTSVoiceType"; added simpleType "TTSLanguage" and "TTSVoiceType".
  2. Added elements "AddTTSAudioClip" and "AddTTSAudioClipResponse" for sending a text, TTS configuration and audio clip configuration to the device.
  3. Added complexType "TTSAudio" for element "TTSConfiguration". It includes parameters such as Content, Language, VoiceType.
  4. Added "AddTTSAudioClipRequest" and "AddTTSAudioClipResponse"

media2.xml and documentation

  1. Added detailed descriptions for AddTTSAudioClip operations, explaining their purpose, parameters, and responses.
  2. Updated audio clip capabilities with TTSCapabilities.

1. Added AddTTSAudioClip request and AddTTSAudioClip response for sending a text and its TTS configuration to the device (1621-1652) (2036-2041) (2418-2422) (2935-2943).
2. Added complex type "TTSAudio" (1465-1485) for TTSConfiguration to support the TTS function. It includes the parameters Content, Language and VoiceType.
3. Updated AudioClipCapabilities with TTSCapabilities (177-181), and added complex type TTSCapabilities (201-220) to indicate that the device supports the TTS function and its corresponding configuration.
Complex type TTSCapabilities includes MaxContentLength, TTSLanguage and TTSVoiceType.
4. Added simpleType TTSLanguage (220-231) and TTSVoiceType (232-238).
1. Added detailed descriptions for AddTTSAudioClip operations, explaining their purpose, parameters, and responses (2359-2416).
2. Updated audio clip capabilities with TTSCapabilities (2698-2700).
update code line information for TTS function
correct some editorial errors
Updated the description of the AddTTSAudioClip operation to clarify the parameters and response. Updated the description of TTSCapabilities.
TTS audio clip pull request was first created as #668
Updated TTS configuration description and added TTSCapabilities entry.
@sujithhanwha
Contributor

OLD PR for reference
#668

@ocampana-videotec ocampana-videotec added this to the 26.06 milestone Dec 4, 2025
doc/Media2.xml Outdated
</varlistentry>
</variablelist>
<para></para>
<para><emphasis role="bold">Note:</emphasis> Audio clip uploads to the device can fail in the following scenarios, and a specific HTTP error code should be returned to the client when an upload fails.</para>
Contributor

@venki5685 venki5685 Dec 9, 2025

This note does not seem applicable to TTSAudioClip.

Author

Yes, it is not for TTS, I will delete it.

Delete inappropriate note for operation AddTTSAudioClip
Contributor

@johado johado left a comment

Some small textual comments.

doc/Media2.xml Outdated
<title>AddTTSAudioClip</title>
<para>This operation adds a text, audio clip configuration and TTS configuration to the device, for device converting the text to an audio clip based on the TTS configuration.
The response to the command includes a unique token for this converted audio clip.
If the device is unable to support language specified in the TTS configuration, the associated configuration will deleted from the device.</para>
Contributor

add "be" to "will be deleted"

Author

Okay, got it.

doc/Media2.xml Outdated
<term>response</term>
<listitem>
<para role="param">Token - [tt:ReferenceToken]</para>
<para role="text">Unique token of the TTS audio clip to be uploaded.</para>
Contributor

Change "to be uploaded" to "that was added" ?

Author

Thank you very much for your advice. We are considering the word "assign", which should be more precise.

doc/Media2.xml Outdated
</varlistentry>
<varlistentry>
<term>TTSCapabilities</term>
<listitem><para>Indicates device supports TTS function and TTS configuration.See tr2: TTSCapabilities.</para></listitem>
Contributor

Add space after .: "..configuration. See tr2:..."

Author

Okay, thank you.

</xs:element>
<xs:element name="Language" type="xs:string">
<xs:annotation>
<xs:documentation>Language for the TTS audio clip playback. See tr2: TTSLanguage. </xs:documentation>
Contributor

Change to "See tr2:TTSLanguage and TTSCapabilities." ?

Author

Thank you for your suggestion. TTSLanguage is already an attribute within TTSCapabilities. If we want to point out that the language for TTS audio clip playback must be one of the languages supported by the device, we could revise the explanation to state this clearly, for example: "The language which is supported and used for TTS audio clip playback."
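
The constraint discussed here, that the requested language has to be one the device advertises, could be checked on the client side before calling AddTTSAudioClip. The sketch below is an assumption-laden illustration: the `TTSCapabilities` dataclass, its field names, the `check_request` helper, and counting MaxContentLength in characters are all hypothetical and not defined by the PR.

```python
from dataclasses import dataclass, field


@dataclass
class TTSCapabilities:
    """Client-side copy of the advertised TTS capabilities (hypothetical shape)."""
    max_content_length: int
    languages: list = field(default_factory=list)
    voice_types: list = field(default_factory=list)


def check_request(caps, content, language, voice_type):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    # Assumption: MaxContentLength is measured in characters.
    if len(content) > caps.max_content_length:
        problems.append("content exceeds MaxContentLength")
    if language not in caps.languages:
        problems.append(f"language {language!r} not in TTSLanguage")
    if voice_type not in caps.voice_types:
        problems.append(f"voice type {voice_type!r} not in TTSVoiceType")
    return problems


caps = TTSCapabilities(max_content_length=20, languages=["en-US"], voice_types=["Female"])
print(check_request(caps, "Door is open.", "en-US", "Female"))
print(check_request(caps, "Door is open.", "de-DE", "Female"))
```

Collecting every problem instead of stopping at the first one lets a client report all mismatches in a single pass.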

</xs:element>
<xs:element name="VoiceType" type="xs:string">
<xs:annotation>
<xs:documentation>The voice type for the TTS audio clip playback. See tr2: TTSVoiceType.</xs:documentation>
Contributor

Change to "See tr2:TTSVoiceType and TTSCapabilities." ?

Author

I propose updating the explanation for TTSVoiceType in the same way as the commit for TTSLanguage.

<xs:sequence>
<xs:element name="Token" type="tt:ReferenceToken">
<xs:annotation>
<xs:documentation>Unique token of the TTS audio clip to be uploaded.</xs:documentation>

change "to be uploaded" to something more relevant. converted, generated, ..?

Author

Thank you very much for bringing it up. Yes, we are considering changing it and using the word "assign", which should be more precise.

<xs:anyAttribute processContents="lax"/>
</xs:complexType>
<!--===============TTS Language================-->
<xs:simpleType name="TTSLanguage">

What is the reasoning behind the choice of languages in the list below?

@robberos robberos Dec 16, 2025

Is there any standard for official language names that can be referred to?

TTSCapabilities and TTSAudio use open strings, so an enum should provide a good pattern.

Author

Thank you so much for your comments! We truly appreciate your input and have been carefully considering how to best define these general concepts. Your mention of ISO international standards was particularly helpful and guided our further research. We also looked into RFC 5646 for language representation across countries. So we would like to use alpha-2 codes to represent languages and countries, as recommended in ISO 639-1 and ISO 3166-1. For languages with regional variations, we plan to adopt the language-country format (e.g., en-US, zh-CN). Thank you again for your feedback.
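
The format described above, an ISO 639-1 alpha-2 language code optionally followed by a hyphen and an ISO 3166-1 alpha-2 country code, can be shape-checked with a short pattern. This is a minimal sketch: the function name `is_language_tag` is hypothetical, the check is stricter about letter case than RFC 5646 requires, and it does not verify that the codes are actually registered.

```python
import re

# Two lowercase letters (language), optionally "-" plus two uppercase
# letters (country). This checks the shape only, not code registration.
_TAG = re.compile(r"[a-z]{2}(?:-[A-Z]{2})?")


def is_language_tag(value: str) -> bool:
    """Return True if value looks like 'en' or 'en-US'."""
    return _TAG.fullmatch(value) is not None


for candidate in ("en", "en-US", "zh-CN", "english", "en_US"):
    print(candidate, is_language_tag(candidate))
```

An equivalent restriction could be expressed as a pattern facet in the schema, which would avoid enumerating individual languages.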

doc/Media2.xml Outdated
</itemizedlist>
</section>
</section>
<section xml:id="section_wvd_dzg_rye">
@robberos robberos Dec 16, 2025

The id should be unique in XML, right? It seems to be a copy of the SetAudioClip section below.

Author

Yes, thank you for the suggestion. I have revised it accordingly.

See <a href="https://www.iso.org/obp/ui/">ISO Country Codes</a>.
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
Collaborator

Do we really need to make an explicit restriction here instead of just defining it as a string? If we go this way, we need to update the WSDL file whenever we add a language.

Author

Thank you very much for your comment! Yes, this is an important issue that we should consider.
Previously, we defined languages using string format and listed commonly used or potentially needed languages. However, this approach does introduce a maintenance burden: as you pointed out, each new language addition would require updating the WSDL file. To address this, we now directly reference ISO-standard language codes via strings. Users may refer to the official ISO codes for specific needs, while the WSDL only defines the reference rules. The examples in TTSLanguage are provided for convenience. I hope this clarifies the approach. Thank you again for your comment!

Added note about enumeration values being illustrative in TTSLanguage.
Revise the description of language definition in TTSCapabilities and TTSAudio
Contributor

@kieran242 kieran242 left a comment

A couple of requested changes

<xs:annotation>
<xs:documentation>
List of supported languages. Uses ISO 639-1 alpha-2 language codes, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
Contributor

The supplied link "https://www.iso.org/obp/ui/" for ISO 3166-1 does not direct you to the standard. Instead, it takes you to the following page. Can we fix this reference, please? :)

Image

Author

Sure, I'll replace the link with the direct reference immediately (https://www.iso.org/obp/ui/#search/code/). Thank you for pointing this out.

<xs:documentation>
The language that is supported by the device and used for TTS audio clip playback.
Uses ISO 639-1 alpha-2 language codes for definition, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
Contributor

As per my previous 2 comments above. Please correct here also.

Author

Understood, I have already corrected this section as well. Thank you for the reminder.

<xs:annotation>
<xs:documentation>
List of supported languages. Uses ISO 639-1 alpha-2 language codes, such as"en" for English. See <a href="https://www.loc.gov/standards/iso639-2/php/English_list.php">Codes for the Representation of Names of Languages</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
Contributor

Suggested change
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166 Country Codes</a>.
Optionally combined with ISO 3166-1 alpha-2 country codes using the "language-country" format to specify regional variations, such as"en-US" for American English. For country codes, see <a href="https://www.iso.org/obp/ui/">ISO 3166-1 Country Codes</a>.

Author

Yes, that makes it more accurate. I've updated the relevant section. Thank you for your advice! :)

@kieran242
Contributor

kieran242 commented Mar 31, 2026

@Peggy0422 When you use the "AddTTSAudioClip" API, is the "TTSConfiguration" stored on the device with the audio clip, or is it just used to create the audio clip dynamically? If it is stored on the device, there is no way to update or identify it.

Further, when you request "GetAudioClips", there does not seem to be a way to tell an uploaded audio clip from a TTS audio clip, other than the audio clip token returned to the user from the API. This would make updating or deleting a TTS audio clip difficult without keeping track of the tokens and your "TTSConfiguration" in some way.

@Peggy0422
Author

@kieran242 Thank you very much for your questions. Regarding the "AddTTSAudioClip" API, the "TTSConfiguration" is used solely for generating the audio clip and is not stored on the device.

Typically, adding a TTS audio clip is the first step to enable playback on the device. When a client uses AddTTSAudioClip, the device returns a token via the AddTTSAudioClipResponse that corresponds to the generated TTS audio clip. This token serves as a unique identifier for subsequent operations, such as Get, Set or Delete.
Hope this addresses your concerns, thank you.
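
Since the TTSConfiguration is not stored on the device and the token is the only identifier, a client that wants to tell TTS clips from uploaded clips would need its own bookkeeping. The following is a minimal sketch of that idea; `TTSRecord`, `TTSClipRegistry`, their methods, and the sample token and voice type values are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSRecord:
    """The TTS settings a client used when it created a clip."""
    content: str
    language: str
    voice_type: str


class TTSClipRegistry:
    """Remembers which clip tokens were created via AddTTSAudioClip."""

    def __init__(self):
        self._records = {}

    def remember(self, token, record):
        # Call with the token from AddTTSAudioClipResponse.
        self._records[token] = record

    def forget(self, token):
        # Call after the clip has been deleted from the device.
        self._records.pop(token, None)

    def is_tts_clip(self, token):
        return token in self._records

    def lookup(self, token):
        return self._records.get(token)


registry = TTSClipRegistry()
registry.remember("clip-1", TTSRecord("Door is open.", "en-US", "Female"))
print(registry.is_tts_clip("clip-1"), registry.is_tts_clip("clip-2"))
```

Tokens returned by GetAudioClips that are absent from the registry would then be treated as uploaded clips, at least for clips created by this client.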

update the reference link for country code
7 participants